A Second Glance at The Weighted Sum. 


The weighted sum is often encountered in computer science and engineering. It is used as an informal 
alternative to the dot product with a weight vector, to which it is equivalent. Ultimately, though, the dot 
product is the more richly developed mathematical concept. 


The weighted sum versus the dot product. 
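Given 3 terms $a=1$, $b=2$, $c=3$ and 3 weights $w_0=3$, $w_1=2$ and $w_2=1$, the weighted sum $s$ is: 

$$s = a \cdot w_0 + b \cdot w_1 + c \cdot w_2 = 1 \cdot 3 + 2 \cdot 2 + 3 \cdot 1 = 10$$

The dot product of the vector $x = \langle a, b, c \rangle$ and the weight vector $w = \langle w_0, w_1, w_2 \rangle$ is really the same: 

$$x \cdot w = a \cdot w_0 + b \cdot w_1 + c \cdot w_2 = 1 \cdot 3 + 2 \cdot 2 + 3 \cdot 1 = 10$$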


The difference is that the weights are now in an interpretable vector form, which has length, direction 
and other properties. 
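
The equivalence can be checked with a minimal Python sketch. It uses only the example values from the text; the function names weighted_sum and dot are illustrative choices, not part of any library.

```python
def weighted_sum(a, b, c, w0, w1, w2):
    """The weighted sum written out term by term, as in the text."""
    return a * w0 + b * w1 + c * w2


def dot(x, w):
    """Dot product of two equal-length vectors given as lists."""
    if len(x) != len(w):
        raise ValueError("vectors must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))


x = [1, 2, 3]  # the terms a, b, c
w = [3, 2, 1]  # the weights w0, w1, w2

s = weighted_sum(1, 2, 3, 3, 2, 1)
d = dot(x, w)
print(s, d)  # 10 10
```

The two functions do the same arithmetic. The only change is that the weights travel together as one vector.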


Summing and averaging. 
The weighted sum can be used to perform basic mathematical operations by setting every weight to 1 
(summing), to 1/n (averaging), or to +1 or -1 (adding and subtracting). 


$$\text{Sum} = a \cdot 1 + b \cdot 1 + c \cdot 1 = 6$$

$$\text{Average} = a \cdot (1/3) + b \cdot (1/3) + c \cdot (1/3) = 2$$

$$a + b - c = a \cdot (+1) + b \cdot (+1) + c \cdot (-1) = 0$$

Which of course can be put in dot product form. 
$$\text{Sum} = x \cdot \langle 1, 1, 1 \rangle = 6$$

$$\text{Average} = x \cdot \langle 1/3, 1/3, 1/3 \rangle = 2$$

$$a + b - c = x \cdot \langle +1, +1, -1 \rangle = 0$$
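
As a quick check, here is a small Python sketch of the three operations, each expressed as one dot product with a different weight vector. The helper dot and the variable names are assumptions made for illustration.

```python
def dot(x, w):
    """Dot product of two equal-length vectors given as lists."""
    return sum(xi * wi for xi, wi in zip(x, w))


x = [1, 2, 3]  # a, b, c from the example
n = len(x)

total = dot(x, [1] * n)                # all weights 1: plain sum
average = dot(x, [1 / n] * n)          # all weights 1/n: average
a_plus_b_minus_c = dot(x, [1, 1, -1])  # signs used as weights

print(total, round(average, 12), a_plus_b_minus_c)  # 6 2.0 0
```

The average is rounded before printing only because 1/3 is not exactly representable in floating point.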


The geometric definition of the dot product. 
The dot product can also be stated in geometric form. 


$$x \cdot w = \|x\| \, \|w\| \cos(\theta)$$


Where $\|x\|$ is the vector length of $x$, $\|w\|$ is the vector length of $w$, and $\theta$ is the angle 
between the 2 vectors. 


For 2 vectors at 90 degrees to each other the dot product is always zero, since cos(90 degrees) = 0. 


Using the example values for x and w, the vector length of x is $\sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$, as is the 
vector length of w. The angle between them is approximately 0.775 radians. 


$$\|x\| \, \|w\| \cos(0.775) = 14 \cos(0.775) \approx 10$$


This matches the result from the algebraic formula. 


You can work out the angle using the algebraic formula for the dot product and the inverse cosine 
(arccos) function. 


$$\theta = \arccos\left( \frac{x \cdot w}{\|x\| \, \|w\|} \right)$$

$$\theta = \arccos(10 / 14) \approx 0.775 \text{ radians}$$


As a further example, the angle between $x$ and $\langle 1, 1, 1 \rangle$ is 
$\arccos( 6 / ( \sqrt{14} \cdot \sqrt{3} ) ) \approx 0.388$ radians $\approx 22.2$ degrees, where 6 happens to be 
the sum of the elements of $x$. 
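
The angle calculations above can be reproduced with the standard math module. This is a sketch under the assumption that both vectors are non-zero; the helper names length and angle are illustrative.

```python
import math


def dot(x, w):
    """Dot product of two equal-length vectors given as lists."""
    return sum(xi * wi for xi, wi in zip(x, w))


def length(x):
    """Euclidean vector length."""
    return math.sqrt(dot(x, x))


def angle(x, w):
    """Angle in radians between two non-zero vectors."""
    cos_theta = dot(x, w) / (length(x) * length(w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


x = [1, 2, 3]
w = [3, 2, 1]

theta = angle(x, w)
print(round(theta, 3))                                     # 0.775
print(round(length(x) * length(w) * math.cos(theta), 6))   # 10.0
print(round(math.degrees(angle(x, [1, 1, 1])), 1))         # 22.2
```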


Compositions of weighted sums. 
The weighted sum of a number of weighted sums can be simplified into a single weighted sum. 


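If $r = 1 \cdot y + 2 \cdot z$ and $s = 2 \cdot y + 1 \cdot z$ and $t = 3 \cdot r + 4 \cdot s$, then $t$ can be simplified: 

$$t = 3(1 \cdot y + 2 \cdot z) + 4(2 \cdot y + 1 \cdot z) = 11y + 10z$$

Likewise, compositions of dot products can be simplified. 

$$t = \langle \langle y, z \rangle \cdot \langle 1, 2 \rangle, \ \langle y, z \rangle \cdot \langle 2, 1 \rangle \rangle \cdot \langle 3, 4 \rangle$$

$$t = 11y + 10z$$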
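
The simplification can be sketched in Python as follows. The function compose is a hypothetical helper that folds the outer weights into the inner weight vectors; the test values for y and z are arbitrary.

```python
def dot(x, w):
    """Dot product of two equal-length vectors given as lists."""
    return sum(xi * wi for xi, wi in zip(x, w))


def compose(inner_weights, outer_weights):
    """Collapse a weighted sum of weighted sums into one weight vector."""
    n_inputs = len(inner_weights[0])
    return [
        sum(o * row[j] for o, row in zip(outer_weights, inner_weights))
        for j in range(n_inputs)
    ]


inner = [[1, 2], [2, 1]]  # weights of r and s over the inputs y, z
outer = [3, 4]            # weights of t over r, s

combined = compose(inner, outer)
print(combined)  # [11, 10]

y, z = 5.0, -2.0  # arbitrary test inputs
r, s = dot([y, z], inner[0]), dot([y, z], inner[1])
t_nested = dot([r, s], outer)
t_flat = dot([y, z], combined)
print(t_nested, t_flat)  # 35.0 35.0
```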


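This is a point to watch out for with certain neural networks: layers of weighted sums with no 
non-linearity between them collapse into a single layer of weighted sums. 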


The variance equation for linear combinations of random variables. 
If X is a random variable and a, b are non-random constants, the expectation, standard deviation and 
variance of aX+b are: 


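$$\mathrm{E}(aX + b) = a \, \mathrm{E}(X) + b$$

$$\mathrm{SD}(aX + b) = |a| \, \mathrm{SD}(X)$$

$$\mathrm{Var}(aX + b) = a^2 \, \mathrm{Var}(X)$$

If $X_1, X_2, \dots, X_n$ are independent random variables then, 

$$\mathrm{Var}[c_1 X_1 + c_2 X_2 + \dots + c_n X_n] = c_1^2 \mathrm{Var}[X_1] + c_2^2 \mathrm{Var}[X_2] + \dots + c_n^2 \mathrm{Var}[X_n]$$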


This allows you to calculate the effect of independent noise on the input terms to a weighted sum or to 
a dot product with a weight vector. 
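
To make that concrete, the sketch below applies the variance formula to the example weight vector and compares the prediction with a simple simulation. The noise variance of 0.25 per input term, the Gaussian noise model, the sample count and the seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def weighted_sum_variance(weights, variances):
    """Variance of sum(c_i * X_i) for independent X_i: sum(c_i^2 * Var[X_i])."""
    return sum(c * c * v for c, v in zip(weights, variances))


weights = [3, 2, 1]                    # the example weight vector w
noise_variances = [0.25, 0.25, 0.25]   # assumed noise variance on each input

predicted = weighted_sum_variance(weights, noise_variances)  # 14 * 0.25

# Monte Carlo check: add independent Gaussian noise to x = <1, 2, 3>.
rng = random.Random(0)
x = [1, 2, 3]
outputs = []
for _ in range(50_000):
    noisy = [xi + rng.gauss(0.0, math.sqrt(v))
             for xi, v in zip(x, noise_variances)]
    outputs.append(sum(ni * wi for ni, wi in zip(noisy, weights)))

mean = statistics.fmean(outputs)
simulated = statistics.fmean([(o - mean) ** 2 for o in outputs])
print(predicted, round(simulated, 2))
```

The predicted value is 3.5, and the simulated variance should land close to it, within sampling error.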


Information storage. 
If you want a dot product to result in the value 1 and you have an input vector with independent noise 
present in all its terms, you can make one input term 1 and one weight term 1. 


$$1 + \text{noise} = \langle 0 + \text{noise}, \ 1 + \text{noise}, \ 0 + \text{noise} \rangle \cdot \langle 0, 1, 0 \rangle$$

That cuts out most of the noise in the input, leaving only the noise variance of a single term. 
Alternatively, you can make all the input terms 1 and all the weight terms 1/n. 

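$$1 + \text{noise} = \langle 1 + \text{noise}, \ 1 + \text{noise}, \ 1 + \text{noise} \rangle \cdot \langle 1/3, 1/3, 1/3 \rangle$$

The noise variance is reduced to $3 \cdot (1/3)^2 = 1/3$ of the noise variance of a single term. 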

You can say averaging is better than cutting. 

In both cases the angle between the input vector and the weight vector is zero. 


If you increase the angle between the input vector and the weight vector toward 90 degrees (by altering 
the weight vector) and you still want to get the value 1 out, then the length of the weight vector must 
increase to compensate for the decreasing value of $\cos(\theta)$. The output becomes very noisy as a 
result, which you can see from the variance formula for linear combinations. 


The conclusion is that the dot product prefers fully distributed (non-sparse) input and weight vectors, 
with small angles between them, to produce the lowest-noise output. 
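
The comparison between cutting, averaging and an angled weight vector can be sketched numerically. This assumes every input term carries independent noise of the same variance, so the output noise variance is that variance times the sum of the squared weights. The name noise_gain and the "angled" weight vector are hypothetical choices for illustration.

```python
import math


def dot(x, w):
    """Dot product of two equal-length vectors given as lists."""
    return sum(xi * wi for xi, wi in zip(x, w))


def angle_degrees(x, w):
    """Angle in degrees between two non-zero vectors."""
    cos_theta = dot(x, w) / math.sqrt(dot(x, x) * dot(w, w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def noise_gain(w):
    """Output noise variance per unit of input noise variance: sum of w_i^2."""
    return dot(w, w)


cases = {
    "cutting": ([0, 1, 0], [0, 1, 0]),
    "averaging": ([1, 1, 1], [1 / 3, 1 / 3, 1 / 3]),
    "angled": ([1, 1, 1], [2, -1, 0]),  # hypothetical weights, output still 1
}

for name, (x, w) in cases.items():
    print(name,
          round(dot(x, w), 6),
          round(angle_degrees(x, w), 1),
          round(noise_gain(w), 3))
```

All three cases output 1 for the noise-free input. The noise gain is 1 for cutting, 1/3 for averaging, and 5 for the angled case, whose weight vector sits at roughly 75 degrees to the input.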


If you store a single association of a vector key and a scalar value, $[k_i, v_i]$, using a dot product with 
a weight vector ($v_i = k_i \cdot w$), the angle between the key and the weight vector will equal zero using 
any reasonable solving algorithm. 


If you store 2 associations the angles between the key vectors and weight vector can no longer be zero 
in general. The more associations you store the greater the average angle between the key vectors and 
the weight vector. The length of the weight vector must increase to compensate for the increasing 
angles and the system becomes more sensitive to noisy inputs. 


A further requirement for successfully storing information using a dot product with a weight vector is 
the linear independence of the keys. The keys should be linearly independent so that the system of 
linear equations involved can be solved. 
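
The storage behaviour described above can be sketched for one and two associations. This assumes that "any reasonable solving algorithm" means the minimum-length weight vector satisfying the equations; the text does not name a specific solver. The functions store_one and store_two and the example keys are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors given as lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def length(a):
    """Euclidean vector length."""
    return math.sqrt(dot(a, a))


def angle_degrees(a, b):
    """Angle in degrees between two non-zero vectors."""
    cos_theta = dot(a, b) / (length(a) * length(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def store_one(key, value):
    """Shortest w with key . w = value: a scaled copy of the key."""
    scale = value / dot(key, key)
    return [scale * k for k in key]


def store_two(k1, v1, k2, v2):
    """Shortest w with k1 . w = v1 and k2 . w = v2.

    w = a1*k1 + a2*k2, where (a1, a2) solves the 2x2 Gram system.
    """
    g11, g12, g22 = dot(k1, k1), dot(k1, k2), dot(k2, k2)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("keys are linearly dependent")
    a1 = (g22 * v1 - g12 * v2) / det
    a2 = (g11 * v2 - g12 * v1) / det
    return [a1 * x + a2 * y for x, y in zip(k1, k2)]


k1 = [1, 1, 1, 1]
k2 = [1, -1, 1, -1]

w_one = store_one(k1, 1.0)
w_two = store_two(k1, 1.0, k2, 1.0)

print(round(angle_degrees(k1, w_one), 6), round(length(w_one), 4))
print(round(angle_degrees(k1, w_two), 6), round(length(w_two), 4))
```

With one association the weight vector lies along the key and has length 0.5. With two, the angle to each key opens to 45 degrees and the weight vector grows to length about 0.707. Passing linearly dependent keys raises an error, because the Gram system then has no unique solution.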


If the keys don’t automatically meet the requirements, there are a number of different preprocessing 
functions ( $k_{\text{new}} = f(k_{\text{original}})$ ) available. 


Some examples are: 

Random Projections (distribution, some generalization.) 

Hashing (distribution, approximate independence.) 

Locality Sensitive Hashing (distribution, approximate independence, some generalization.) 

Random Projection followed by an amplitude-sensitive non-linearity (distribution, independence, some generalization.) 


If you use hashing then ideally the bits should be viewed as -1, +1 not 0, 1. 
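
As one possible illustration of such a preprocessing function, the sketch below uses the random-hyperplane form of locality sensitive hashing: a random Gaussian projection followed by taking the sign of each result. The text does not say which scheme is intended, so this choice, the names make_projection and preprocess_key, the output size of 16 and the seed are all assumptions.

```python
import random


def make_projection(n_in, n_out, seed=0):
    """Random Gaussian projection matrix: n_out rows, each of length n_in."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def preprocess_key(key, projection):
    """Project the key, then keep only the sign of each result as -1 or +1."""
    projected = [sum(p * k for p, k in zip(row, key)) for row in projection]
    return [1 if value >= 0 else -1 for value in projected]


projection = make_projection(n_in=3, n_out=16, seed=0)
k_new = preprocess_key([1, 2, 3], projection)
print(k_new)
```

Every element of the new key has magnitude 1, so the key is fully distributed rather than sparse, and its bits are already in the -1, +1 form.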

There are 3 capacity cases: 

Under capacity: Recall with a reduction in input noise (biased toward the weight vector.) 

At capacity: Recall. 

Over capacity: Recall with added Gaussian noise. 


You can also study information storage in the weighted sums of conventional neural networks using the 
same concepts of distribution and angle to the weight vector. 